SCALE FREE REDUCED RANK IMAGE ANALYSIS



Suppose we are given an $N \times n$ data matrix $X$ of rank $\rho$. For convenience, we assume that the origins and scalings of the $n$ variables are such that the $n \times n$ correlation matrix $R$ is given by

$$R = X'X \tag{1}$$

To begin with, we make no assumptions about the rank of $X$ or the relative values of $N$ and $n$. In particular, we may have $N < n$. We let $D$ be an $n \times n$ basic diagonal scaling matrix, otherwise at present unspecified, and define

$$W = XD \tag{3}$$

We let

$$G = W'W \tag{4}$$

Hence from (1), (3), and (4)

$$G = DRD \tag{5}$$
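As a quick numerical check of (1), (3), and (5), the NumPy sketch below standardizes a random data matrix so that $X'X$ is a correlation matrix, applies an arbitrary positive diagonal scaling, and confirms that $G = W'W$ equals $DRD$. The sizes, the random data, and the scale factors are hypothetical illustrations, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 50, 6                           # hypothetical sizes: N entities, n variables
X = rng.standard_normal((N, n))
X = X - X.mean(axis=0)                 # centre each variable
X = X / np.linalg.norm(X, axis=0)      # unit column norms, so X'X is a correlation matrix
R = X.T @ X                            # equation (1)

D = np.diag(rng.uniform(0.5, 2.0, n))  # hypothetical basic diagonal scaling matrix
W = X @ D                              # equation (3)
G = W.T @ W                            # equation (4)

print(np.allclose(np.diag(R), 1.0))    # unit diagonal of a correlation matrix
print(np.allclose(G, D @ R @ D))       # equation (5)
```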

The general problem of approximating the matrices $X$ and $W$ and their corresponding covariance matrices $R$ and $G$ has been extensively treated over the years in books and articles on factor analysis techniques and applications, and the literature is too voluminous and well known to require specific references. However, a distinction has been drawn and given major emphasis by some investigators between principal component analysis and factor analysis. Unfortunately, the distinction has for the most part been discussed with reference to the covariance matrices rather than the data matrices and, as a consequence, considerable confusion and controversy have resulted with reference to the implications of the distinction for the data matrices. Horst (1969) has recently attempted to unify the various proposed models of principal component and factor analysis by showing that they may be regarded as special cases of a more general approach which utilizes variable parameters for scaling and loss functions. As a procedure for reconciling the various proposed matrix approximation models, our generalized factor analysis appears to have some merit. However, it does fail to include as a special case an important model introduced by Guttman (1953) and elaborated by Harris (1962), known as image analysis. The Guttman model has fundamental and important implications from the prediction point of view and, if one takes the position that prediction is the ultimate goal of science, these implications assume overriding importance. Perhaps the most important feature of the Guttman-Harris model has to do with the particular transformation that is applied to the data matrix. This transformation is such that each column of the transformed matrix is the best least squares estimate of the corresponding column of the data matrix as estimated from the remaining columns.

As Harris has shown, the model can be generalized so that it is scale free, and this scale free model has interesting invariant properties with reference to the matrix of transformed variables. The model, however, has two serious limitations. First, it assumes that the correlation or covariance matrix is basic. A necessary though not sufficient condition for this assumption to hold is that $N > n$, that is, that the number of entities is greater than the number of variables. In many important cases of actual data, this assumption is not satisfied.

In the second place, the model assumes there are no errors of measurement in a data matrix that samples some domain of entities and attributes.

We shall develop a more generalized model of the image analysis type that is free of these two assumptions. We let $V$ and $U$ be $n \times m$ basic matrices, where $m \le n$, and define an $n \times n$ matrix $B$ by

$$B = UV' \tag{6}$$

We let

$$\Delta = D_B + I \tag{7}$$

where

$$D_B = \operatorname{diag}(B) \tag{8}$$

Also we define

$$Z = WB \tag{9}$$

and

$$Y = W(B - D_B) \tag{10}$$

From equations (6) and (9) it is clear that the rank of $Z$ cannot be greater than $m$. From equation (10) it is clear that each column of $Y$ is independent of the corresponding column of $W$, and hence also of $X$: since $B - D_B$ has a vanishing diagonal, column $j$ of $Y$ is a linear combination of the columns of $W$ other than column $j$. This property of $Y$ corresponds to that of the transformed data matrix in the traditional image analysis model. In that model, however, the $B$ matrix is taken as basic.
We next define an $N \times n$ residual matrix $e$ by

$$e = W - Y \tag{11}$$

From equations (7), (10), and (11)

$$e = W(\Delta - B) \tag{12}$$

We let

$$\varphi = \operatorname{tr} e'e \tag{13}$$

and

$$\psi = \varphi + \operatorname{tr} \Delta d \tag{14}$$

where $d$ is a diagonal matrix of Lagrangian multipliers. We wish to minimize $\varphi$ subject to the constraint (7). Therefore we take the symbolic derivative of $\psi$ with respect to the matrix $V$ and equate it to zero, viz.

$$\frac{\partial \psi}{\partial V} = 0 \tag{14a}$$
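Now from (6) and (7)

$$\operatorname{tr} \Delta d = \operatorname{tr}(UV'd) + \operatorname{tr} d \tag{15}$$

From (4), (6), (12), (13), (14), and (15)

$$\psi = \operatorname{tr}(VU'GUV' - VU'G\Delta - \Delta GUV' + \Delta G\Delta + UV'd + d) \tag{16}$$

From (14a) and (16), absorbing a factor of $\tfrac{1}{2}$ into the arbitrary multipliers $d$,

$$U'GUV' = U'(G\Delta - d) \tag{17}$$

From (17)

$$V' = (U'GU)^{-1}U'(G\Delta - d) \tag{18}$$

From (6) and (18)

$$B = U(U'GU)^{-1}U'(G\Delta - d) \tag{19}$$

Let

$$S = U(U'GU)^{-1}U' \tag{20}$$

From (19) and (20)

$$B = SG\Delta - Sd \tag{21}$$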
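The stationarity condition can be checked numerically. The sketch below builds a hypothetical positive definite $G$, an arbitrary basic $U$, and fixed diagonal $\Delta$ and $d$, computes $V$ from (18), and confirms by central finite differences that the gradient of $\psi$ vanishes there. To make (17) hold without a factor of $\tfrac{1}{2}$, the constraint term is written as $2\operatorname{tr}(Bd)$, which merely rescales the multipliers; as in (16), $\Delta$ is held fixed while differentiating.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 2                                   # hypothetical sizes
A = rng.standard_normal((n, n))
G = A @ A.T + n * np.eye(n)                   # hypothetical positive definite G
U = rng.standard_normal((n, m))               # arbitrary basic n x m matrix
Delta = np.diag(rng.uniform(1.0, 2.0, n))     # held fixed while differentiating, as in (16)
d = np.diag(rng.uniform(-0.5, 0.5, n))        # diagonal Lagrangian multipliers

def psi(V):
    """Objective of (16) with B = UV'; the factor 2 on tr(Bd) matches (17) exactly."""
    B = U @ V.T
    E = Delta - B
    return np.trace(E.T @ G @ E) + 2.0 * np.trace(B @ d)

UGU = U.T @ G @ U
V_star = np.linalg.solve(UGU, U.T @ (G @ Delta - d)).T   # V, the transpose of (18)

# Central finite-difference gradient at V_star (exact up to rounding, since psi is quadratic).
step = 1e-5
grad = np.zeros_like(V_star)
for i in range(n):
    for j in range(m):
        H = np.zeros_like(V_star)
        H[i, j] = step
        grad[i, j] = (psi(V_star + H) - psi(V_star - H)) / (2 * step)

S = U @ np.linalg.solve(UGU, U.T)             # (20)
B = U @ V_star.T                              # (19)
print(np.max(np.abs(grad)))                   # close to zero
print(np.allclose(B, S @ G @ Delta - S @ d))  # (21)
```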

Suppose now we assume that the scaling matrix $D$ in (3) and (5) can be chosen so that $B$ is symmetrical, that is,

$$B = B' \tag{22}$$

A sufficient condition that (22) be satisfied is obviously that $SG$ is symmetrical and that $\Delta$ and $d$ be scalar matrices, since by (20) $S$ is symmetric. Suppose we indicate the basic structure of $G$ by

$$G = Q_m \delta_m Q_m' + Q_s \delta_s Q_s' \tag{23}$$

where

$$\rho = m + s \tag{24}$$

A sufficient condition that $SG$ be symmetrical is that

$$U = Q_m c \tag{25}$$

where $c$ is any $m \times m$ basic matrix. Condition (25) is also necessary if we let $\delta_m$ contain any $m$ nonvanishing roots of $G$, and $Q_m$ the corresponding vectors. However, we shall take these to be the $m$ largest roots and later justify this choice. From (20), (23), and (25)

$$S = Q_m \delta_m^{-1} Q_m' \tag{26}$$

$$SG = Q_m Q_m' \tag{27}$$

From (21), (26), and (27)

$$B = Q_m Q_m' \Delta - Q_m \delta_m^{-1} Q_m' d \tag{28}$$
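The claim that $S$ in (26) does not depend on the basic matrix $c$ in (25), and that $SG$ in (27) is then symmetric, is easy to confirm numerically. In this sketch $G$ is a hypothetical random Gramian and $c$ an arbitrary random $m \times m$ matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 6, 2                                  # hypothetical sizes
A = rng.standard_normal((n, n))
G = A @ A.T                                  # hypothetical Gramian, basic with probability 1
evals, evecs = np.linalg.eigh(G)             # roots in ascending order
order = np.argsort(evals)[::-1]
delta, Q = evals[order], evecs[:, order]
Qm, dm = Q[:, :m], delta[:m]                 # the m largest roots and their vectors

def S_of(U):
    """Equation (20): S = U (U'GU)^{-1} U'."""
    return U @ np.linalg.solve(U.T @ G @ U, U.T)

c = rng.standard_normal((m, m))              # arbitrary basic m x m matrix
S = S_of(Qm @ c)                             # U = Q_m c, equation (25)
S_expected = Qm @ np.diag(1.0 / dm) @ Qm.T   # equation (26)

print(np.allclose(S, S_expected))            # S does not depend on c
print(np.allclose(S @ G, Qm @ Qm.T))         # equation (27): SG is symmetric
```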

Now so far we have put no restrictions on $D$ in (3) and (5). Let us assume that $D$ can be determined so that

$$\Delta = If \tag{29}$$

where $f$ is a scalar quantity. Because of (22), (28), and (29) we must also have

$$d = I\alpha \tag{30}$$

where $\alpha$ is a scalar. From (28), (29), and (30)

$$B = Q_m Q_m' f - Q_m \delta_m^{-1} Q_m' \alpha \tag{31}$$

From (7) and (29)

$$D_B = (f - 1)I \tag{32}$$

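We shall now determine $\alpha$ as a function of $f$. Taking the trace of (31), and using (32) together with $\operatorname{tr} Q_m Q_m' = m$ and $\operatorname{tr} Q_m \delta_m^{-1} Q_m' = \operatorname{tr} \delta_m^{-1}$, we have

$$(f - 1)n = mf - \alpha \operatorname{tr} \delta_m^{-1} \tag{33}$$

From (33)

$$\alpha = \frac{n - (n - m)f}{\operatorname{tr} \delta_m^{-1}} \tag{34}$$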

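To determine $f$ we first write from (29), (31), and (34)

$$\Delta - B = (I - Q_m Q_m')f + Q_m \delta_m^{-1} Q_m' \, \frac{n - (n - m)f}{\operatorname{tr} \delta_m^{-1}} \tag{35}$$

From (4), (12), and (13)

$$\varphi = \operatorname{tr}(\Delta - B')G(\Delta - B) \tag{36}$$

From (22) and (36)

$$\varphi = \operatorname{tr}\left(G(\Delta - B)^2\right) \tag{37}$$

From (35)

$$(\Delta - B)^2 = (I - Q_m Q_m')f^2 + Q_m \delta_m^{-2} Q_m' \left[\frac{n - (n - m)f}{\operatorname{tr} \delta_m^{-1}}\right]^2 \tag{38}$$

From (23) and (38)

$$G(\Delta - B)^2 = (G - Q_m \delta_m Q_m')f^2 + Q_m \delta_m^{-1} Q_m' \left[\frac{n - (n - m)f}{\operatorname{tr} \delta_m^{-1}}\right]^2 \tag{39}$$

From (5), (37), and (39)

$$\varphi = (\operatorname{tr} G - \operatorname{tr} \delta_m)f^2 + \frac{[n - (n - m)f]^2}{\operatorname{tr} \delta_m^{-1}} \tag{40}$$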

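For convenience we constrain $D$ so that

$$\operatorname{tr} G = n \tag{41}$$

From (5), (23), and (41)

$$\operatorname{tr} \delta_m + \operatorname{tr} \delta_s = n \tag{41a}$$

Since we wish to minimize $\varphi$, we set

$$\frac{\partial \varphi}{\partial f} = 0 \tag{42}$$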


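From (40) and (41)

$$0 = (n - \operatorname{tr} \delta_m)f - \frac{(n - m)[n - (n - m)f]}{\operatorname{tr} \delta_m^{-1}} \tag{43}$$

From (43)

$$f = \frac{n(n - m)}{(n - \operatorname{tr} \delta_m)\operatorname{tr} \delta_m^{-1} + (n - m)^2} \tag{44}$$

Now let

$$\bar{\alpha} = \frac{n - \operatorname{tr} \delta_m}{n - m} \tag{44a}$$

It is clear then, from (41a), that $\bar{\alpha}$ is the mean of the $n - m$ smallest roots of $G$ (when $G$ is basic, so that $\delta_s$ contains exactly $n - m$ roots). From (44) and (44a)

$$f = \frac{n}{\bar{\alpha} \operatorname{tr} \delta_m^{-1} + (n - m)} \tag{44b}$$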



It is probably intuitively obvious that the solution (44) for $f$ yields a minimum. For this to be the case we should have

$$\frac{\partial^2 \varphi}{\partial f^2} > 0 \tag{45}$$

From (43), apart from a positive factor of 2,

$$\frac{\partial^2 \varphi}{\partial f^2} = (n - \operatorname{tr} \delta_m) + \frac{(n - m)^2}{\operatorname{tr} \delta_m^{-1}} \tag{46}$$

Except for the limiting case of $n = m$, which we shall consider later, (46) does satisfy (45) since, because of (5), (23), and (41), we must have $n > \operatorname{tr} \delta_m$.
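To make (40) through (46) concrete, the sketch below takes a hypothetical set of roots of $G$ that sum to $n$, as (41) requires, computes $\bar\alpha$ and $f$ from (44a) and (44b), checks $f$ against (44), and confirms that $f$ minimizes the loss (40) and that $\alpha$ from (34) equals $\bar\alpha f$, the relation the text states as (48). The helper names `scale_parameters` and `phi` are hypothetical.

```python
import numpy as np

def scale_parameters(roots, m):
    """Return (alpha_bar, f, alpha) from (44a), (44b) and (34).

    `roots` are the roots of G, assumed basic and normalized so that tr G = n, as in (41).
    """
    roots = np.sort(np.asarray(roots, dtype=float))[::-1]   # descending order
    n = roots.size
    dm = roots[:m]                                           # the m largest roots
    t, tinv = dm.sum(), (1.0 / dm).sum()
    alpha_bar = (n - t) / (n - m)                            # (44a)
    f = n / (alpha_bar * tinv + (n - m))                     # (44b)
    alpha = (n - (n - m) * f) / tinv                         # (34)
    return alpha_bar, f, alpha

def phi(f, roots, m):
    """Loss (40) as a function of the scalar f, with tr G = n."""
    roots = np.sort(np.asarray(roots, dtype=float))[::-1]
    n = roots.size
    dm = roots[:m]
    return (n - dm.sum()) * f**2 + (n - (n - m) * f) ** 2 / (1.0 / dm).sum()

roots = [2.5, 1.5, 0.8, 0.6, 0.4, 0.2]   # hypothetical roots of G, summing to n = 6
m = 2
alpha_bar, f, alpha = scale_parameters(roots, m)
t, tinv = 2.5 + 1.5, 1 / 2.5 + 1 / 1.5
f44 = 6 * 4 / ((6 - t) * tinv + 4**2)    # (44), written out for this example

print(alpha_bar)                          # 0.5, the mean of the four smallest roots
print(np.isclose(f, f44))                 # (44) and (44b) agree
print(np.isclose(alpha, alpha_bar * f))   # alpha = alpha_bar * f
print(phi(f, roots, m) < min(phi(f - 0.01, roots, m), phi(f + 0.01, roots, m)))
```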



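If now we substitute (44) in (34), we get

$$\alpha = \frac{n(n - \operatorname{tr} \delta_m)}{(n - \operatorname{tr} \delta_m)\operatorname{tr} \delta_m^{-1} + (n - m)^2} \tag{47}$$

From (44a), (44b), and (47)

$$\alpha = \bar{\alpha} f \tag{48}$$

Substituting (48) in (31) we get

$$B = Q_m (I - \bar{\alpha} \delta_m^{-1}) Q_m' f \tag{48a}$$

If we let

$$a_m = I - \bar{\alpha} \delta_m^{-1} \tag{48b}$$

we have from (48a) and (48b)

$$B = Q_m a_m Q_m' f \tag{49}$$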

It will now be of interest to find the covariance matrices involving the matrices $Z$, $Y$, and $e$ given by equations (9), (10), and (11), respectively.

From (4) and (9)

$$Z'Z = B'GB \tag{50}$$

From (4), (9), and (10)

$$Z'Y = B'G(B - D_B) \tag{51}$$

From (4), (9), and (11)

$$Z'e = B'G(\Delta - B) \tag{52}$$

From (4) and (10)

$$Y'Y = (B' - D_B)G(B - D_B) \tag{53}$$

From (4), (10), and (11)

$$Y'e = (B' - D_B)G(\Delta - B) \tag{54}$$

From (4) and (11)

$$e'e = (\Delta - B')G(\Delta - B) \tag{55}$$

Since $B$ is symmetrical, it can be proved that $B$, $(B - D_B)$, and $(\Delta - B)$, together with $G$, are all commutative for multiplication; indeed, by (49) $B$ is built from the characteristic vectors of $G$, while by (29) and (32) $\Delta$ and $D_B$ are scalar matrices. Therefore we have from (50) through (55), respectively,

$$Z'Z = GB^2 \tag{58}$$

$$Z'Y = GB(B - D_B) \tag{59}$$

$$Z'e = GB(\Delta - B) \tag{60}$$

$$Y'Y = G(B - D_B)^2 \tag{61}$$

$$Y'e = G(B - D_B)(\Delta - B) \tag{62}$$

$$e'e = G(\Delta - B)^2 \tag{63}$$

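Now from (32) and (49)

$$B - D_B = (Q_m a_m Q_m' - I)f + I \tag{64}$$

From (29) and (49)

$$\Delta - B = (I - Q_m a_m Q_m')f \tag{65}$$

Now from (49)

$$B^2 = Q_m a_m^2 Q_m' f^2 \tag{66}$$

From (49) and (64)

$$B(B - D_B) = Q_m a_m[a_m f - (f - 1)I]Q_m' f \tag{67}$$

From (49) and (65)

$$B(\Delta - B) = Q_m a_m (I - a_m) Q_m' f^2 \tag{68}$$

From (64)

$$(B - D_B)^2 = Q_m a_m^2 Q_m' f^2 - 2f(f - 1)Q_m a_m Q_m' + (f - 1)^2 I \tag{69}$$

From (64) and (65)

$$(B - D_B)(\Delta - B) = -Q_m a_m^2 Q_m' f^2 + f(2f - 1)Q_m a_m Q_m' - f(f - 1)I \tag{70}$$

From (65)

$$(\Delta - B)^2 = [I - Q_m a_m(2I - a_m)Q_m']f^2 \tag{71}$$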

We may now write the covariance matrices given by (58) through (63) as functions of $G$ and of the corresponding right sides of equations (66) through (71). We have from these two sets of equations and equation (23)

$$Z'Z = Q_m \delta_m a_m^2 Q_m' f^2 \tag{72}$$

$$Z'Y = Q_m \delta_m a_m[a_m f - (f - 1)I]Q_m' f \tag{73}$$

$$Z'e = Q_m \delta_m a_m (I - a_m) Q_m' f^2 \tag{74}$$

$$Y'Y = Q_m \delta_m a_m^2 Q_m' f^2 - 2f(f - 1)Q_m \delta_m a_m Q_m' + (f - 1)^2 G \tag{75}$$

$$Y'e = -Q_m \delta_m a_m^2 Q_m' f^2 + f(2f - 1)Q_m \delta_m a_m Q_m' - f(f - 1)G \tag{76}$$

$$e'e = [G - Q_m \delta_m a_m(2I - a_m)Q_m']f^2 \tag{77}$$

To further simplify the notation, we let

$$g = Z'Z \tag{78}$$

$$E = Q_m \delta_m a_m Q_m' \tag{79}$$

Then substituting (78) and (79) in (72) through (77)

$$Z'Z = g \tag{80}$$

$$Z'Y = g - f(f - 1)E \tag{81}$$

$$Z'e = -g + f^2 E \tag{82}$$

$$Y'Y = g - 2f(f - 1)E + (f - 1)^2 G \tag{83}$$

$$Y'e = -g + f(2f - 1)E - f(f - 1)G \tag{84}$$

$$e'e = g - 2f^2 E + f^2 G \tag{85}$$
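The identities (80) through (85) can be verified numerically. The sketch below draws a hypothetical data matrix, applies an arbitrary positive diagonal scaling normalized so that $\operatorname{tr} G = n$ as in (41), forms $B$ from (49), and computes $Z$, $Y$, and $e$ from (9), (10), and (12). It takes $D_B = (f - 1)I$ and $\Delta = If$ as in (32) and (29); it does not solve for the particular $D$ that makes the diagonal of $B$ actually constant, so it checks the algebra of this section rather than a fitted model.

```python
import numpy as np

rng = np.random.default_rng(3)
N, n, m = 40, 6, 2                        # hypothetical sizes
X = rng.standard_normal((N, n))
X = X - X.mean(axis=0)
X = X / np.linalg.norm(X, axis=0)         # X'X is a correlation matrix, (1)
d = rng.uniform(0.5, 2.0, n)
d *= np.sqrt(n / np.sum(d**2))            # tr DRD = sum of d_i^2 = n, (41)
W = X @ np.diag(d)                        # (3)
G = W.T @ W                               # (4)

roots, vecs = np.linalg.eigh(G)           # ascending order
roots, vecs = roots[::-1], vecs[:, ::-1]  # descending order
Qm, dm = vecs[:, :m], roots[:m]
alpha_bar = (n - dm.sum()) / (n - m)                  # (44a)
f = n / (alpha_bar * np.sum(1.0 / dm) + (n - m))      # (44b)
a_m = 1.0 - alpha_bar / dm                            # diagonal of (48b)

I = np.eye(n)
B = Qm @ np.diag(a_m) @ Qm.T * f          # (49)
Z = W @ B                                 # (9)
Y = W @ (B - (f - 1) * I)                 # (10) with D_B = (f - 1)I, (32)
e = W @ (f * I - B)                       # (12) with Delta = If, (29)

g = Z.T @ Z                               # (78)
E = Qm @ np.diag(dm * a_m) @ Qm.T         # (79)
checks = {
    "(81)": np.allclose(Z.T @ Y, g - f * (f - 1) * E),
    "(82)": np.allclose(Z.T @ e, -g + f**2 * E),
    "(83)": np.allclose(Y.T @ Y, g - 2 * f * (f - 1) * E + (f - 1) ** 2 * G),
    "(84)": np.allclose(Y.T @ e, -g + f * (2 * f - 1) * E - f * (f - 1) * G),
    "(85)": np.allclose(e.T @ e, g - 2 * f**2 * E + f**2 * G),
}
print(checks)
```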

We may now regard $g$ as the rank $m$ approximation to $G$, the covariance matrix of $W$, which is the rescaled data matrix $X$. Then if we let

$$A = Q_m \delta_m^{1/2} a_m f \tag{86}$$

we have from (72), (78), and (86)

$$g = AA' \tag{87}$$

and $A$ may be regarded as the factor loading matrix.

The matrix $A$ may be compared with the conventional basic structure form

$$\mathcal{A} = Q_m \delta_m^{1/2} \tag{88}$$

where $\mathcal{A}$ is the basic structure solution in our generalized factor analysis with variable scaling and loss function parameters, special cases of which have been shown to closely resemble, if not be identical to, a number of current factor analytic models.

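We may now evaluate the $Z$, $Y$, and $e$ matrices directly. We let the basic structure of $W$ be given by

$$W = \begin{bmatrix} P_m & P_s \end{bmatrix} \begin{bmatrix} \delta_m^{1/2} & 0 \\ 0 & \delta_s^{1/2} \end{bmatrix} \begin{bmatrix} Q_m' \\ Q_s' \end{bmatrix} \tag{89}$$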

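From (9), (49), and (89)

$$Z = P_m \delta_m^{1/2} a_m Q_m' f \tag{90}$$

From (10), (48b), (64), and (89)

$$Y = P_m \delta_m^{1/2}[a_m f - (f - 1)I]Q_m' - (f - 1)P_s \delta_s^{1/2} Q_s' \tag{91}$$

From (12), (48b), (65), and (89)

$$e = \left(P_m \delta_m^{1/2}(I - a_m)Q_m' + P_s \delta_s^{1/2} Q_s'\right)f \tag{92}$$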

It is clear from (90) that $Z$ is of rank $m$, while from (91) and (92) we see that $Y$ and $e$ are of rank $n$ if $X$ is basic. It is also clear from (89) that

$$P_m = W Q_m \delta_m^{-1/2} \tag{93}$$

We may regard $P_m$ as yet another type of score matrix.
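A short numerical sketch of (89) through (93): with $P_m$ obtained from $W$ as in (93), and $P_s$ built the same way from the remaining roots, the columns of $P_m$ are orthonormal, and the expressions (90) and (92) reproduce $Z = WB$ and $e = W(\Delta - B)$. The data, the scaling, and the choice $m = 2$ are hypothetical; as in the text, $\Delta = If$.

```python
import numpy as np

rng = np.random.default_rng(4)
N, n, m = 40, 5, 2                        # hypothetical sizes
X = rng.standard_normal((N, n))
X = X - X.mean(axis=0)
X = X / np.linalg.norm(X, axis=0)
d = rng.uniform(0.5, 2.0, n)
d *= np.sqrt(n / np.sum(d**2))            # tr G = n, (41)
W = X @ np.diag(d)
G = W.T @ W

roots, vecs = np.linalg.eigh(G)
roots, vecs = roots[::-1], vecs[:, ::-1]  # descending roots
Qm, Qs = vecs[:, :m], vecs[:, m:]
dm, ds = roots[:m], roots[m:]
alpha_bar = ds.mean()                     # (44a): mean of the n - m smallest roots
f = n / (alpha_bar * np.sum(1.0 / dm) + (n - m))   # (44b)
a_m = 1.0 - alpha_bar / dm                          # (48b)
B = Qm @ np.diag(a_m) @ Qm.T * f                    # (49)

Pm = W @ Qm / np.sqrt(dm)                 # (93): W Q_m delta_m^{-1/2}
Ps = W @ Qs / np.sqrt(ds)                 # same construction for the remaining roots

Z90 = Pm @ np.diag(np.sqrt(dm) * a_m) @ Qm.T * f                   # (90)
e92 = (Pm @ np.diag(np.sqrt(dm) * (1.0 - a_m)) @ Qm.T
       + Ps @ np.diag(np.sqrt(ds)) @ Qs.T) * f                     # (92)

print(np.allclose(Pm.T @ Pm, np.eye(m)))           # orthonormal score columns
print(np.allclose(Z90, W @ B))                      # (90) agrees with (9)
print(np.allclose(e92, W @ (f * np.eye(n) - B)))    # (92) agrees with (12)
```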



To summarize, we then have five types of score matrices, as follows:

(1) The $W$ matrix is the $X$ matrix rescaled so that the $B$ matrix calculated from the roots and vectors, as indicated by (49), has the diagonal

$$D_B = (f - 1)I \tag{94}$$

where $f$ is given by (44b).

(2) The $Z$ matrix is the rank $m$ approximation to the $W$ matrix by the $B$ transformation on $W$.

(3) The $Y$ matrix is the best least squares approximation to the $W$ matrix calculated from the $Z$ matrix, where each $Y$ vector is independent of the corresponding $W$ vector. It is important to note that $B$ has been determined so as to optimize this approximation in the least squares sense.

(4) The $e$ matrix is the matrix of residuals between $W$ and $Y$, and $B$ is determined so as to minimize the trace of its product moment.

(5) The $P_m$ matrix is analogous to the principal axis factor score matrix. It is in fact the principal axis factor score matrix for the rescaled data matrix $W$. The width of this matrix is only $m$, whereas the other four matrices are all of width $n$. This matrix is perhaps of most practical interest among the five types.

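To return to the covariance matrices (80) through (85), to date their properties have been only briefly investigated. Perhaps the most that can be said as of now concerns the traces of $Y'Y$, $Y'e$, and $e'e$. By straightforward but somewhat tedious manipulation, it can be proved that

$$\operatorname{tr} Y'Y = n - \bar{\alpha} f n \tag{95}$$

$$\operatorname{tr} Y'e = 0 \tag{96}$$

$$\operatorname{tr} e'e = \bar{\alpha} f n \tag{97}$$

It is clear therefore from (5), (41), (95), (96), and (97) that

$$\operatorname{tr} Y'Y + \operatorname{tr} e'e = \operatorname{tr} G \tag{98}$$

which is as it should be: since $W = Y + e$ by (11), $\operatorname{tr} G = \operatorname{tr} Y'Y + 2\operatorname{tr} Y'e + \operatorname{tr} e'e$, and the cross term vanishes by (96).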
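The trace results (95) through (98) can likewise be checked numerically. As in the text's derivation, this hypothetical sketch takes $D_B = (f - 1)I$ from (32) and an arbitrary scaling normalized by (41); it verifies the algebra, not a solved scaling $D$.

```python
import numpy as np

rng = np.random.default_rng(5)
N, n, m = 60, 7, 3                          # hypothetical sizes
X = rng.standard_normal((N, n))
X = X - X.mean(axis=0)
X = X / np.linalg.norm(X, axis=0)
d = rng.uniform(0.5, 2.0, n)
d *= np.sqrt(n / np.sum(d**2))              # tr G = n, (41)
W = X @ np.diag(d)
G = W.T @ W

roots, vecs = np.linalg.eigh(G)
roots, vecs = roots[::-1], vecs[:, ::-1]    # descending roots
Qm, dm = vecs[:, :m], roots[:m]
alpha_bar = (n - dm.sum()) / (n - m)                     # (44a)
f = n / (alpha_bar * np.sum(1.0 / dm) + (n - m))         # (44b)
B = Qm @ np.diag(1.0 - alpha_bar / dm) @ Qm.T * f        # (49)

Y = W @ (B - (f - 1) * np.eye(n))           # (10) with (32)
e = W - Y                                   # (11)

tr_YY, tr_Ye, tr_ee = np.trace(Y.T @ Y), np.trace(Y.T @ e), np.trace(e.T @ e)
print(np.isclose(tr_YY, n - alpha_bar * f * n))   # (95)
print(np.isclose(tr_Ye, 0.0, atol=1e-10))         # (96)
print(np.isclose(tr_ee, alpha_bar * f * n))       # (97)
print(np.isclose(tr_YY + tr_ee, np.trace(G)))     # (98)
```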

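We may now consider the case where $n = m$ and $R$ is basic. In this case we must return to equations (31) and (32). Equation (31) becomes simply

$$B = If - Q\delta^{-1}Q'\alpha \tag{99}$$

From (32) and (99)

$$B - D_B = I - Q\delta^{-1}Q'\alpha \tag{100}$$

From (5) and (23)

$$Q\delta^{-1}Q' = D^{-1}R^{-1}D^{-1} \tag{101}$$

Since the diagonal of $B - D_B$ vanishes, we have from (100) and (101)

$$0 = I - D^{-1}D_{R^{-1}}D^{-1}\alpha \tag{102}$$

where $D_{R^{-1}}$ is the diagonal of $R^{-1}$. From (102)

$$D^2 = D_{R^{-1}}\alpha \tag{103}$$

From (41) and (103), since $R$ has a unit diagonal so that $\operatorname{tr} G = \operatorname{tr} D^2$,

$$\alpha = \frac{n}{\operatorname{tr} R^{-1}} \tag{104}$$

From (103) and (104)

$$D^2 = \frac{n}{\operatorname{tr} R^{-1}} D_{R^{-1}} \tag{105}$$

Equation (105) is the same result obtained by Harris's scale free modification of Guttman's image analysis model, except for the scaling factor $n / \operatorname{tr} R^{-1}$.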
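For the case $n = m$, the sketch below takes the scaling (105), sets $\alpha$ by (104), and forms $B$ from (99) with the hypothetical value $f = 1.3$ (in this limiting case (44) is indeterminate, so $f$ is not fixed). It then confirms that the diagonal of $B$ is $(f - 1)I$, as (32) requires, and that each column of $Y = W(B - D_B)$ equals the least squares prediction of the corresponding column of $W$ from the remaining columns, which is the defining property of the Guttman image.

```python
import numpy as np

rng = np.random.default_rng(6)
N, n = 50, 5                               # hypothetical sizes; R is basic
X = rng.standard_normal((N, n))
X = X - X.mean(axis=0)
X = X / np.linalg.norm(X, axis=0)
R = X.T @ X

R_inv = np.linalg.inv(R)
D = np.diag(np.sqrt(n * np.diag(R_inv) / np.trace(R_inv)))   # (105)
W = X @ D
G = D @ R @ D                              # (5)
alpha = n / np.trace(R_inv)                # (104)
f = 1.3                                    # hypothetical; not determined when n = m

B = f * np.eye(n) - alpha * np.linalg.inv(G)    # (99)
print(np.isclose(np.trace(G), n))               # (41)
print(np.allclose(np.diag(B), f - 1))           # (32)

Y = W @ (B - (f - 1) * np.eye(n))               # (10)
for j in range(n):
    others = [k for k in range(n) if k != j]
    coef, *_ = np.linalg.lstsq(W[:, others], W[:, j], rcond=None)
    assert np.allclose(W[:, others] @ coef, Y[:, j])   # image of column j
print("each column of Y is the image of the corresponding column of W")
```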

If now $R$ is of rank $m$, where $m < n$, then $\operatorname{tr} \delta_m = n$ by (41a), and we have from (44)

$$f = \frac{n}{n - m} \tag{106}$$

and from (34) and (106)

$$\alpha = 0 \tag{107}$$

From (31), (106), and (107)

$$B = Q_m Q_m' \frac{n}{n - m} \tag{108}$$

From (32), (106), and (108)

$$\frac{m}{n - m} I = D_{Q_m Q_m'} \frac{n}{n - m} \tag{110}$$

where $D_{Q_m Q_m'}$ is the diagonal of $Q_m Q_m'$. From (110)

$$D_{Q_m Q_m'} = \frac{m}{n} I \tag{111}$$

It is seen therefore from (111) that if the $R$ matrix is not basic and $m$ is taken as its rank, then the scaling of $R$ must be such that the row vectors of the vertical basic orthonormal $Q_m$ are all of equal length. That this is always possible has not been proved, although it has been proved for special cases. In any event, it appears to date that this case is of more theoretical than practical interest.
Of more general and practical interest is the case where $m$ is less than the rank of $R$, whether or not $R$ is basic. Thus far we have ignored the question of how to determine $m$. This of course is the question of how many factors to solve for, and although many answers have been proposed, none of them has gained universal acceptance. The various tests of statistical significance leave much to be desired. In our opinion, these tests are founded on assumptions that are irrelevant or inappropriate for most important practical situations.

One of the simplest and most appealing criteria is the one proposed by Kaiser and accepted by many as a good rule-of-thumb procedure. It is the number of roots greater than unity in the correlation matrix. This rule may be generalized to the number of roots greater than unity in the matrix $G$ as defined in the foregoing developments. According to this criterion, the smallest root in $\delta_m$ of (23) would be greater than unity, and the largest root in $\delta_s$ of (23) should not exceed unity. As a first approximation one might start with the Kaiser criterion, namely, the number of roots greater than unity in the correlation matrix.
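The Kaiser count and its generalization to $G$ amount to counting characteristic roots above unity. In the sketch below, `kaiser_m` is a hypothetical helper name, and the block-structured correlation matrix is an illustration chosen so that the answer is easy to check by hand: each $3 \times 3$ block with common correlation $0.6$ has roots $2.2$, $0.4$, and $0.4$. The count for $G$ depends on the scaling, here an arbitrary one normalized by (41).

```python
import numpy as np

def kaiser_m(M):
    """Number of characteristic roots of the symmetric matrix M that exceed unity."""
    return int(np.sum(np.linalg.eigvalsh(M) > 1.0))

block = np.full((3, 3), 0.6) + 0.4 * np.eye(3)   # unit diagonal, off-diagonal 0.6
R = np.block([[block, np.zeros((3, 3))],
              [np.zeros((3, 3)), block]])        # hypothetical correlation matrix
print(kaiser_m(R))                               # 2

# Generalized criterion: count the roots of G = DRD for a scaling with tr G = n.
d = np.array([1.2, 1.0, 0.8, 1.1, 0.9, 1.0])    # hypothetical scale factors
d *= np.sqrt(R.shape[0] / np.sum(d**2))         # enforce (41)
G = np.diag(d) @ R @ np.diag(d)
print(kaiser_m(G))
```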

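It may be of interest to examine the generalization of the Kaiser criterion to the case of the $G$ matrix. First let us return to equation (97). From this we get

$$\frac{\operatorname{tr} e'e}{n} = \bar{\alpha} f \tag{112}$$

From (98) and (112) we get

$$\frac{\operatorname{tr} Y'Y}{n} = 1 - \bar{\alpha} f \tag{113}$$

From (44a) and (44b)

$$\bar{\alpha} f = \frac{n(n - \operatorname{tr} \delta_m)}{(n - \operatorname{tr} \delta_m)\operatorname{tr} \delta_m^{-1} + (n - m)^2} \tag{114}$$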

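To gain better insight into the properties of (114), we let

$$\mu = \frac{\operatorname{tr} \delta_m}{m} \tag{115}$$

That is, $\mu$ is simply the mean of the $m$ values in $\delta_m$. We let

$$\nu^2 = \frac{\operatorname{tr}(\delta_m - \mu I)^2}{m} \tag{116}$$

so that $\nu^2$ is simply the variance of the values in $\delta_m$. It can be proved that

$$\operatorname{tr} \delta_m^{-1} = \frac{m}{\mu}\left(1 + \frac{\nu^2}{\mu^2} + \theta\right) \tag{117}$$

where $\theta$ is a positive quantity which tends to increase as $\nu^2$ increases. We let

$$\varepsilon = \frac{\nu^2}{\mu^2} + \theta \tag{118}$$

so that $\varepsilon$ also increases as $\nu^2$ increases. From (114), (115), (117), and (118)

$$\bar{\alpha} f = \frac{n\mu(n - m\mu)}{m(n - m\mu)(1 + \varepsilon) + (n - m)^2\mu} \tag{119}$$

We let

$$r = \frac{n}{m} \tag{120}$$

From (119) and (120)

$$\bar{\alpha} f = \frac{\mu(r - \mu)}{1 + (r - 2)\mu + (1 - \mu/r)\varepsilon} \tag{121}$$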
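From (113), (120), and (121)

$$\frac{\operatorname{tr} Y'Y}{n} = \frac{(\mu - 1)^2 + (1 - \mu/r)\varepsilon}{1 + (r - 2)\mu + (1 - \mu/r)\varepsilon} \tag{122}$$

From (112), (115), and (122)

$$\frac{\operatorname{tr} e'e}{\operatorname{tr} Y'Y} = \frac{(n - \operatorname{tr} \delta_m)\operatorname{tr} \delta_m}{(\operatorname{tr} \delta_m - m)^2 + (n - \operatorname{tr} \delta_m)\eta} \tag{123}$$

where

$$\eta = \frac{m^2 \varepsilon}{n}$$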

Since for a given $m$ the ratio in (123) should be as small as possible, the numerator term should be small and the denominator term large. As all the roots in $\delta_m$ approach equality, $\eta$ in (123) approaches zero. For a given $m$ the ratio in (123) would then approach a minimum if these roots were the $m$ largest roots of $G$. That this would also be the case if the $m$ largest roots were not all equal is probably true, but we have not found a mathematical proof.

Perhaps a better argument for choosing $m$ as the number of roots of $G$ greater than unity is suggested by (92). Neglecting the scaling factor $f$ on the right, the right-hand term in the parentheses is precisely the basic structure form of the residual matrix in the principal axis type solution. The term on the left involves the remaining parts of the basic orthonormals $P$ and $Q$ of $W$ in (89). By (48b), the basic diagonal of this matrix is $\delta_m^{1/2}(I - a_m) = \bar{\alpha}\delta_m^{-1/2}$. We wish to suppress this component of the $e$ matrix, given by the first term on the right of (92), as much as possible. In particular, we wish to guarantee that

(124a) [inequality illegible in the source]

(124b) [inequality illegible in the source]

where (124b) implies that the inequality holds for each of the diagonal elements. Obviously, because of (41a) and (44a), a necessary and sufficient condition for both (124a) and (124b) to hold is that $\delta_m$ consist of the roots of $G$ greater than unity.

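To see how we might calculate the $D$ scaling matrix, we proceed as follows. We let

$$V = DU \tag{125}$$

$$\sigma = V(V'RV)^{-1}V' \tag{126}$$

From (1) through (5), (20), (125), and (126)

$$S = D^{-1}\sigma D^{-1} \tag{127}$$

$$GS = DR\sigma D^{-1} \tag{128}$$

and, taking $c$ in (25) to be orthonormal,

$$\operatorname{tr} \delta_m = \operatorname{tr}(V'RV) \tag{129}$$

$$\operatorname{tr} \delta_m^{-1} = \operatorname{tr}(V'RV)^{-1} \tag{130}$$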
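The relations (127) and (128) follow from $U = D^{-1}V$ and $G = DRD$, and can be checked numerically. This hypothetical sketch also confirms that $SG = D^{-1}\sigma RD$, the form in which $SG$ enters (131). The matrices are random illustrations, not data from the text.

```python
import numpy as np

rng = np.random.default_rng(7)
N, n, m = 40, 5, 2                          # hypothetical sizes
X = rng.standard_normal((N, n))
X = X - X.mean(axis=0)
X = X / np.linalg.norm(X, axis=0)
R = X.T @ X                                 # correlation matrix, (1)
D = np.diag(rng.uniform(0.5, 2.0, n))       # hypothetical scaling
D_inv = np.linalg.inv(D)
G = D @ R @ D                               # (5)

V = rng.standard_normal((n, m))             # arbitrary basic n x m matrix
sigma = V @ np.linalg.solve(V.T @ R @ V, V.T)        # (126)
U = D_inv @ V                                        # (125): V = DU
S = U @ np.linalg.solve(U.T @ G @ U, U.T)            # (20)

print(np.allclose(S, D_inv @ sigma @ D_inv))         # (127)
print(np.allclose(G @ S, D @ R @ sigma @ D_inv))     # (128)
print(np.allclose(S @ G, D_inv @ sigma @ R @ D))     # SG as used in (131)
```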

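From (31), (127), and (128)

$$B = D^{-1}\sigma R D f - D^{-1}\sigma D^{-1}\alpha \tag{131}$$

From (7), (32), and (131)

$$D^2(f - 1) = D_{\sigma R} D^2 f - D_\sigma \alpha \tag{132}$$

where $D_{\sigma R}$ and $D_\sigma$ denote the diagonals of $\sigma R$ and $\sigma$. From (48) and (132)

$$D^2(f - 1) = (D_{\sigma R} D^2 - \bar{\alpha} D_\sigma)f \tag{132a}$$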

Let

$$\beta = \frac{f - 1}{f} \tag{133}$$

From (132a) and (133)

$$D^2 = (D_{\sigma R} - \beta I)^{-1}\bar{\alpha} D_\sigma \tag{134}$$



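From (41) and (134)

$$n - \operatorname{tr}(D_{\sigma R} - \beta I)^{-1}\bar{\alpha} D_\sigma = 0 \tag{135}$$

From (44b) and (133)

$$\beta = \frac{m - \bar{\alpha}\operatorname{tr}\delta_m^{-1}}{n} \tag{136}$$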

Let

$$\Gamma_m = D_{Q_m Q_m'} \tag{137}$$

From (27), (128), (134), and (137)

$$D^2 = (\Gamma_m - \beta I)^{-1}\bar{\alpha} D_\sigma \tag{138}$$

and from (135) and (138)

$$n - \operatorname{tr}(\Gamma_m - \beta I)^{-1}\bar{\alpha} D_\sigma = 0 \tag{139}$$

Without loss of generality, assume that the values in $\Gamma_m$ are in descending order of magnitude. It can be proved that one and only one $\beta$ satisfying (139) lies between each of the $(n - 1)$ adjacent pairs of values in $\Gamma_m$. But (139) has $n$ roots. Because of the left side of (138), and since $\bar{\alpha} D_\sigma$ is positive definite, the matrix in parentheses must also be positive definite. This obviously cannot be the case for any $\beta$ greater than the smallest value in $\Gamma_m$. The remaining root must lie outside the range of the values in $\Gamma_m$. It cannot be greater than the largest value in $\Gamma_m$, since no such value could satisfy (139). It must therefore be smaller than the smallest value in $\Gamma_m$, and can therefore satisfy both (138) and (139). Also, because of (136), $\beta$ must be greater than zero.
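The admissible root of (139) can be located by bisection: for $\beta$ below the smallest diagonal value of $Q_m Q_m'$ (which by (27) and (128) is also the diagonal of $\sigma R$), the left side of (139) decreases monotonically from $n$ toward $-\infty$. In this sketch the diagonal values, the value of $\bar\alpha$, and the helper name `admissible_beta` are all hypothetical; the resulting $D^2$ from (138) is positive and has trace $n$ by construction.

```python
import numpy as np

def admissible_beta(gamma, dsigma, alpha_bar, tol=1e-13):
    """Root of n - sum(alpha_bar * dsigma / (gamma - beta)) = 0 lying below min(gamma)."""
    gamma, dsigma = np.asarray(gamma, float), np.asarray(dsigma, float)
    n = gamma.size

    def h(beta):                                  # left side of (139)
        return n - np.sum(alpha_bar * dsigma / (gamma - beta))

    hi = gamma.min()                              # h -> -infinity as beta -> hi from below
    lo = hi - 1.0
    while h(lo) <= 0.0:                           # step down until h(lo) > 0
        lo = hi - 2.0 * (hi - lo)
    while hi - lo > tol:                          # bisection; hi itself is never evaluated
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

gamma = np.array([0.9, 0.7, 0.5])     # hypothetical diagonal of Q_m Q_m', descending
dsigma = np.array([0.4, 0.3, 0.2])    # hypothetical diagonal of sigma
alpha_bar = 0.5                       # hypothetical
beta = admissible_beta(gamma, dsigma, alpha_bar)
D2 = alpha_bar * dsigma / (gamma - beta)          # diagonal of D^2 from (138)
print(beta)
print(D2, D2.sum())                               # positive entries; sum is (numerically) n = 3
```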

Next we may write from (23) and (25), with $c$ orthonormal,

$$U = GU(U'G^2U)^{-1/2}h \tag{140}$$

where $h$ is any square orthonormal matrix. From (5) and (125)

$$V = D^2RV(V'RD^2RV)^{-1/2}h \tag{141}$$

Equation (141), together with (126), (129), (130), (134), and (135), provides the basis for the suggested iterative computing algorithms to solve for the $D$ scaling matrix, which in turn provides the basis for all other computations. We let



(142) through (148) [equations illegible in the source]



We begin with the basic structure solution of $R$ given by (149), where $m$ is the number of roots in $R$ greater than unity, and from it compute the starting quantities (150) through (156).

(149) through (156) [equations largely illegible in the source]

Beginning then with $i = 1$ and using (156), one would iterate with the computations (142) through (148) until, hopefully, $D$ stabilized.

It is possible that better procedures for solving for $D$ could be formulated. For example, in equation (128) $GS$ is symmetric. We might solve iteratively for the $D$ which, in the least squares sense, makes each approximation to $GS$ most nearly symmetric. For example, let

(157) through (160) [equations largely illegible in the source]

where the dot means elemental multiplication. It can be shown that the vector corresponding to the smallest root in (159) is proportional to the $D$ which minimizes the trace of the product moment of the residual matrix in (158). What methods will be most efficient in solving for $D$ must await actual computational research.
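One way to realize this idea, offered as a hypothetical sketch rather than a reconstruction of (157) through (160): since $GS = DR\sigma D^{-1}$ by (128), write $M = R\sigma$; then $DMD^{-1}$ is symmetric exactly when $D^2M - M'D^2 = 0$. Writing $p$ for the vector of diagonal elements of $D^2$, the sum of squares of $D^2M - M'D^2$ equals $2p'Kp$ with $K = \operatorname{diag}\big((M \circ M)\mathbf{1}\big) - M \circ M'$, where $\circ$ is elemental multiplication, so the minimizing $p$ of unit length is the characteristic vector of $K$ for its smallest root. The helper name `symmetrizing_d2` is hypothetical, and the test matrix is built so that an exactly symmetrizing $D^2$ is known.

```python
import numpy as np

def symmetrizing_d2(M):
    """Unit vector p minimizing ||diag(p) M - M' diag(p)||_F (hypothetical helper)."""
    K = np.diag((M * M).sum(axis=1)) - M * M.T   # symmetric quadratic form; '*' is elementwise
    roots, vecs = np.linalg.eigh(K)              # roots in ascending order
    p = vecs[:, 0]                               # vector for the smallest root
    return p if p.sum() > 0 else -p              # fix the arbitrary sign

rng = np.random.default_rng(8)
n = 5
A = rng.standard_normal((n, n))
S0 = A @ A.T + np.eye(n)                     # dense symmetric matrix
d2_true = rng.uniform(0.5, 2.0, n)           # hypothetical diagonal of D^2
M = np.diag(1.0 / d2_true) @ S0              # so diag(d2_true) @ M = S0 is symmetric

p = symmetrizing_d2(M)
print(np.allclose(p / p[0], d2_true / d2_true[0]))   # recovered up to scale
```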



ADDENDUM

Recently Joreskog¹ has presented a model for image factor analysis, together with computational procedures for estimating the parameters of the model. The relationships among the objectives and end results of his approach and ours are not yet clear.

¹ Joreskog, K. G. Efficient estimation in image factor analysis. Psychometrika, 34, 51-75, 1969.